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Abstract 

Free-discontinuity problems describe situations where the solution of interest is 
defined by a function and a lower dimensional set consisting of the discontinuities 
of the function. Hence, the derivative of the solution is assumed to be a 'small' 
function almost everywhere except on sets where it concentrates as a singular mea- 
sure. This is the case, for instance, in crack detection from fracture mechanics or 
in certain digital image segmentation problems. If we discretize such situations for 
numerical purposes, the free-discontinuity problem in the discrete setting can be 
re-formulated as that of finding a derivative vector with small components at all 
but a few entries that exceed a certain threshold. This problem is similar to those 
encountered in the field of 'sparse recovery', where vectors with a small number 
of dominating components in absolute value are recovered from a few given linear 
measurements via the minimization of related energy functionals. Several iterat! 
ive thresholding algorithms that intertwine gradient-type iterations with thresh- 
olding steps have been designed to recover sparse solutions in this setting. It is 
natural to wonder if and/or how such algorithms can be used towards solving dis- 
crete free-discontinuity problems. The current paper explores this connection, and, 
by establishing an iterative thresholding algorithm for discrete free-discontinuity 
problems, provides new insights on properties of minimizing solutions thereof. 
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1 Introduction 

In the following introductory sections, we will establish the mathematical setting of 
the paper, and review the features of free-discontinuity problems that are relevant to 
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the current discussion. 

1.1 Free-discontinuity problems: the Mumford-Shah functional 

The terminology 'free-discontinuity problem' was introduced by De Giorgi [22J to in- 
dicate a class of variational problems that consist in the minimization of a functional, 
involving both volume and surface energies, depending on a closed set K C R , and a 
function u on M d usually smooth outside of K. In particular, 

• K is not fixed a priori and is an unknown of the problem; 

• K is not a boundary in general, but a free-surface inside the domain of the 
problem. 

The best-known example of a free-discontinuity problem is the one modelled by the 
so-called Mumford-Shah functional [30], which is defined by 

J{u,K):= [ [\Vu\ 2 + a(u- g) 2 ] dx + PH d ~ l (K n Q) . 
Jn\K 

The set Q is a bounded open subset of M. d , a, j3 > are fixed constants, and g £ L°°(£l). 
Here 7i N denotes the ^-dimensional Hausdorff measure. Throughout this paper, the 
dimension of the underlying Euclidean space M rf will always be d = 1 or d = 2. In the 
context of visual analysis, g is a given noisy image that we want to approximate by the 
minimizing function u G W l ' 2 {Q\K); the set K is simultaneously used in order to seg- 
ment the image into connected components. For a broad overview on free-discontinuity 
problems, their analysis, and applications, we refer the reader to [2]. 

If the set K were fixed, then the minimization of J with respect to u would be a 
relatively simple problem, equivalent to solving the following system of equations: 

Au = a(u — g), in $7 \ K, 
du 

— = 0, on 9fiU K, 

ov 

where v is the outward-pointing normal vector at any x £ dQ U K. Therefore the 
relevant unknown in free-discontinuity problems is the set K. Ensuring the existence 
of minimizers (u, K) of J is a challenging problem because there is no topology on the 
closed sets that ensures 

(a) compactness of minimizing sequences and 

(b) lower semicontinuity of the Hausdorff measure. 

Indeed, it is well-known, by the direct method of calculus of variations [201 Chap- 
ter 1], that the two previous conditions ensure the existence of minimizers. How- 
ever, the problem becomes more manageable if we restrict our domain to functions 
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u G BV(£l) n Vl^ 1 ' 2 (0 \ K), and make the identification K = S u where S u is the 
well-defined discontinuity set of u. In this case, we need to work only with a topol- 
ogy on the space BV(£l) of bounded variation, and no set topology is anymore required. 

Unfortunately the space BV(£l) is 'too large'; it contains Cantor-like functions whose 
approximate gradient vanishes, Vu = 0, almost everywhere, and whose discontinuity 
set has measure zero, H d ~ l (S u ) = 0. As these functions are dense in L 2 (J7), the prob- 
lem is trivialized; see [1] for details. 

Nevertheless, it is possible to give a meaningful formulation of the functional J if 
we exclude such functions and restrict J to the space SBV(Q) constituted of BV- 
functions with vanishing Cantor part. If we assume again K = S u , the solution can be 
recast as the minimization of 



The existence of minimizers in SBV for the functional ([I]) was established by Ambrosio 
on the basis of his fundamental compactness theorem in [3], see also [H Theorem 4.7 
and Theorem 4.8]. 

1.2 T-convergence approximation to free-discontinuity problems 

The discontinuity set S u of a SBV- function u is not an object that can be easily 
handled, especially numerically. This difficulty gave rise to the development of approx- 
imation methods for the Mumford-Shah functional and its minimizers where sets are 
no longer involved, and instead substituted by suitable indicator functions. In order 
to understand the theoretical basis for these approximations, we need to introduce the 
notion of T-convergence, which is today considered one of the most successful notions 
of 'variational convergence'; we state only the definition of T-convergence below, but 
refer the reader to [20^ [T3] for a broad introduction. 

Definition 1.1. Let (X,d) be a metric spac$\ and let f,f n '>X—> [0, oo] be functions 
for n £ N. We say that (/ n )neN ^-converges to f if the following two conditions are 
satisfied: 

i) for any sequence (x n ) n C X converging to x, 



ii) for any x E X, there exists a sequence (x n ) n C X converging to x such that 



1 Observe that by [201 Proposition 8.7] suitable bounded sets X endowed with the weak topology 
induced by a larger Banach space are indeed metrizable, so this condition is not that restrictive. 




(1) 



liminf f n (x n ) > f{x); 



limsup/ n (x n ) < f(x). 



ii 
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One important consequence of Definition 11.11 is that if a sequence of functionals 
f n T-converges to a target functional /, then the corresponding minimizers of f n also 
converge to minimizers of /, see [20, Corollary 7.30]. 

We define now 

F £ (u,v):=J [v 2 \Vu\ 2 + a(u - g) 2 ] + ^e\Vv\ 2 + ^—^-^j dx (2) 

over the domain (L 2 (Q)) 2 , along with the related functional 

/ F £ {u,v) , if v e W x ' 2 (n), uv e WA 2 (0), and < v < 1, 
Je(u ' uj: "\ oo , else. {6} 

Note that at the minimizer (u, v) of J £ , the function < v < 1 tends to indicate the 
discontinuity set S u of the functional dU) as e — > 0. In [5J Ambrosio and Tortorelli 
proved the following r-approximation result: 

Theorem 1.2 (Ambrosio- Tortorelli '90). For any infinitesimal sequence (£n)n? 
functional J' En (u,v) T-converges in (L 2 (f2)) 2 to the functional 

^ ' | oo , otherwise. ^ 
1.3 Discrete approximation 

In fact, the Mumford-Shah functional is the continuous version of a previous discrete 
formulation of the image segmentation problem proposed by Geman and Geman in [28J ; 
see also the work of Blake and Zisserman in [8]. Let us recall this discrete approach. 
For simplicity let d = 2 (as for image processing problems), Q = [0, l] 2 , and let 
u(hi, hj), G Z 2 be a discrete function defined on $7^ := Q n hZ 2 , for h > 0. 



u 



l :J 



Define Wh{t) = min{t , /3/h} to be the truncated quadratic potential, and 



(hi,hj)£Qi, 



+ h 2 W h 

{hi,hj)&Q, h 

+ ah 2 ^ (uij-gij) 2 . 
(hi,hj)en h 

Chambolle [16|. \T7\ gave formal clarification as to how the discrete functional 3 

approximates the continuous functional J of Ambrosio: discrete sequences can be 
interpolated by piecewise linear functions in such a way as to allow for discontinuities 
when the discrete finite differences of the sampling values are large enough. On the 
basis of this identification of discrete functions on and functions defined on the 
'continuous domain' f2, we have the following result: 
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Theorem 1.3 (Chambolle '95). The functional J ^j^jji r "-converges in 3(0,) (the space 
of B orel- measurable functions, which is metrizable, see \1 7f for details) to 



J cab (u)= f [\Vu\ 2 + a(u-g) 2 ]dx + (3C(S u ) 



Jn\s, 



as h — > 0, where C is the so-called 'cab-driver' measure defined below. 

Basically C measures the length of a curve only through its projections along hori- 
zontal and vertical axes; for a regular C 1 curve c = 7QO, 1]), with j(t) = (71 (i), 72^)) £ 
f2, we have 



The reason this anisotropic (or, direction dependent) measure appears, in place of the 
Hausdorff measure in the Mumford-Shah functional, is due to the approximation of 
derivatives by finite differences defined on a 'rigid' squared geometry. A discretization 
of derivatives based on meshes adapted to the morphology of the discontinuity indeed 
leads to precise approximations of the Mumford-Shah functional |18|. [12] . 

1.4 Free-discontinuity problems and discrete derivatives 

In the literature, several methods have been proposed to numerically approximate min- 
imizers of the Mumford-Shah functional \12\ \W[ \T7\ [29] . In particular, a relaxation 
algorithm, based essentially on alternated minimization of a finite element approxima- 
tion of the Ambrosio and Tortorelli functional ([3]) , leads to iterated solutions of suitable 
elliptic PDEs, where the differential part includes the auxiliary variable v which en- 
codes and indicates information about the discontinuity set. These implementations 
are basically finite dimensional approximations to the following algorithm: Starting 
with v^ ' = 1, iterate 



However, neither has a proof of convergence of this iterative process to its station- 
ary points been explicitly provided in the literature, nor have the properties of such 
stationary points been investigated, especially in case of genuine inverse problems (see 
the discussion in Subsection II ,4.3j) . 

In this paper, we take a different approach and investigate how minimization of the 
T-approximating discrete functionals ([5]) can be implemented efficiently by iterative 
thresholding on the discrete derivatives. Unlike the aforementioned approach, we will 
be able to provide a rigorous proof of convergence to stationary points, which coincide 
with local minimizers of the discrete Mumford-Shah functional. Moreover, we are 
able to characterize stability properties of such stationary points, and demonstrate the 
stability of global minimizers of the discrete Mumford Shah functional. 
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Let us recall: the solutions u of a free-discontinuity problem are supposed to be 
smooth out of a minimal ipersurface K. This means that the distributional deriva- 
tive of u is a 'small function' everywhere except on K where it coincides with a 
singular measure. In the discrete approximation ([5]), the vector of finite differences 
(wj) = { Ut ' 3+1 h M ' J ; u,+lj ft Ul ' J ) corresponds to a piecewise constant function that is 
small everywhere except for a few locations, corresponding to \wj\ > y//3/h, that ap- 
proximate the discontinuity set K. So, in terms of derivatives, solutions of @ are 
vectors having only few large entries. In the next section, we clarify how we can indeed 
work with just derivatives and forget the primal problem. 



1.4.1 The 1-D case 



Let us assume for simplicity that the dimension d = 1, the domain f2 = [0,1], and 
the parameters a = (3 = 1. Denote by Ui = u(hi) a discrete function defined on 
hi S {1^ := 0, n hZ, for h > 0; note that the vector (itj) E M n for n = [l/h\. In this 
setting, the discrete functional ([5]) reduces to 



= h Yj W h 

{hi)en h 

+ h £ ( 



Ui+l 



Ui 



II; 



(hi)en h 



where we recall that Whit) = min{i 2 , 1/h}. Since no geometrical anisotropy is now in- 
volved (d = 1), it is possible to show that this discrete functional T-converges precisely 
to the corresponding Mumford-Shah functional on intervals 



For (ui)hien h we define the discrete derivative as the matrix : 
maps (ui)hien h into ( Ui + 1 ~ Ui ) , given by 



on— 1 



/-ll 







-1 1 











that 



(5) 



\ -1 1 / 

It is not too difficult to show that 

u = D k D h u + c, 

where Dl is the pseudo-inverse matrix of (in the Moore-Penrose sense; note that 



D h maps 



pn— 1 



into W 1 and is an injective operator) and c is a constant vector which 



depends on u, and the values of its entries coincide with the mean value hY2hi&n h u i 
of u. Therefore, any vector u is uniquely identified by the pair (D^u, c). 
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Since constant vectors comprise the null space of Dh, the orthogonality relation (D' h DhU, c)/>n = 
holds for any vector u and any constant vector c. Here the scalar product (-, -)gn = 
Y2i u i v i is the standard Euclidean scalar product, which induces the Euclidean norm 

1 /2 

\\ u \\(-2 := i^i u i) • Using this orthogonality property, we have that 
\\u-g\\q = \\DlD h u- D\D h g + (c - c g )\\jn 

= \\ D l D hU- D l D h9\\q + \\c-Cg\\q 

Hence, with a slight abuse of notation, we can reformulate the original problem in 
terms of derivatives, and mean values, by 

Ji/Vh( z i c ) = H D h z ~ f\\% + Hc-c g \\q + h J2 m[n y Zi \ 2 >\\ 

i ^ ' 

where z = D^u and / = DlD^g. Of course at the minimizer u we have c = c g , since 
this term in Jxj^fh does not depend on z. Therefore, \\c — c s ||| does not play any role 
in the minimization and can be neglected. Once the minimal derivative vector z is 
computed, we can assemble the minimal u by incorporating the mean value of g as 
follows: 

u = D ] h z + c g . 



1.4.2 The 2-D case, discrete Schwartz conditions, and constrained opti- 
mization 

Let us assume now d = 2, $7 = [0, l] 2 , and again a = [3 = 1. Denote Uij = u(hi,hj), 
€ 1?, a discrete function defined on := Q, n hi?, n= \ l/h\, and 

(hi,hj)eQ h 

+ h 2 Yl w h ( Ut,3 \ — 

(hi,hj)en h 

In two dimensions, we have to consider the derivative matrix : I™ 2 — ► R 2n ( n_1 ) that 
maps the vector (uj + ^_i^ n ) := (ujj) to the vector composed of the finite differences in 
the horizontal and vertical directions u x and u y respectively, given by 

(ux)j+ n (i-i) ■= (ux)i,j ■= Ui+1 >i- Ui >i , i = 1, . . . , n - 1, j = 1, . . . , ra 
K)j+(n-i)(i-i) : = (u y )i,j ■= U ^ +1 ~ U ^ , i = l,...,n,j = 1, ... ,n- 1 

Note that its range R(Dh) C IR 2n ( n_1 ) is a (n 2 — l)-dimensional subspace because 
DhC = for constant vectors c £ 1" . Again, we have the differentiation-integration 




D h u :-- 



u.. 
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formula, given by 



D\D h u + c, 



where d\ is the pseudo-inverse matrix of (in the Moore-Penrose sense); note that 

Also, c is a constant vector that depends on u, 

(hi,hj)en h u i,j 



D k maps R(Dh) injectively into R 

and the values of its entries coincide with the mean value h 2 Ylru; u^^o. Uii of u. 



Proceeding as before and again with a slight abuse of notation, we can reformulate 
the original discrete functional ([5]) in terms of derivatives, and mean values, by 



J, 



l/Vh 



z,c) 



h 2 [\\D[z 



ft 



+ c- 



-g\\ f n< 



mm < z. 



*t,3 I 



, 1 



where z = D^u G R 2n ( ra l \ and / = D^D^g € R™ 2 . Of course c = c g is again assumed 
at the minimizer u, since this latter term in J^j^fg does not depend on z. However, in 

order to minimize only over vectors in R 2n ( n_1 ) that are derivatives of vectors in R" 2 , 
we must minimize <Ji/^/f l (z, c) subject to the constraint D^D^z = z. 



u i,j + l (U x )i,j + 1 U 1+ i ( j + i 



(%)i,j 



(Uy)i + 1, j 



(Ux) i, ; 



U i + 1, 



Figure 1: Compatibility conditions of derivatives in 2D. 

The 2n(n — 1) linearly independent constraints D^D^z = z are equivalent to the 
discrete Schwartz constraints^ 

{Uy)i,j + (Ux)i,j+1 = ( u y)i+l,j + (Ux)i,j, (6) 

that establish the equivalence of the length of the paths from m j to Uj+ij+i, whether 
one moves in vertical first and then in horizontal direction or in horizontal first and 
then in vertical direction (see Figure [1]) . 

In short, we arrive at the following constrained optimization problem: 

2 These discrete conditions correspond to the well-known Schwartz mixed derivative theorem for 
which d xy u — d yx u for any u £ C 2 (Q). 
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r Minimize J 1/VK (z) = h 2 [\\Tz - f\\ 2 n2 + min { \ Zi>j \ 2 , {} ] . 

(7) 

I subject to Qz = 0, 

for T = D^ h and Q = I — DhD^. Once the minimal derivative vector z is computed, 
we can assemble the minimal u by incorporating the mean value of g as follows: 

u = D j h z + c g . 

1.4.3 Regularization of inverse problems by means of the Mumford-Shah 
constraint 

The Mumford-Shah regularization term 

MS{u)= ( \Vu\ 2 + (3H d - 1 (S u ), (8) 
Ju\s u 

has been used frequently in inverse problems for image processing |23[ I32j. such as in- 
painting and tomographic inversion. Despite the successful numerical results observed 
in the aforementioned papers for the minimization of functionals of the type 

J{u) = a\\Ku-g\\ L 2 (m + MS{u), (9) 

where K : L 2 (Q) — > L 2 (Q) is a bounded operator which is not boundedly invertible, 
no rigorous results on existence of minimizers are currently available in the literature. 
Indeed, the Ambrosio compactness theorem [3] used for the proof of the case K = I 
does not apply in general. A few attempts towards using the regularization MS for in- 
verse problems in fracture detection appear in the work of Rondi [6'6 \ \'64\ [55] . although 
restrictive technical assumptions on the admissible discontinuities of the solutions are 
required. 



As one of the contributions to this paper, we show that discretizations of regularized 
functionals of the type (J9j) always have minimizers (see Theorem I2.2p . More precisely, 
these discretizations correspond to functionals of the form, 

J^/pftiu) := ah 2 \\Ku-g\\l+h 2 ^ 

(hi,hj)€U h 

(10) 

and we prove that such functionals admit minimizers. Note that the discrete Mumford- 
Shah approximation ([5J) can be written in this form. We go on to show that such min- 
imizers can be characterized by certain fixed point conditions, see Theorem 14.11 and 
Theorem 14.21 As a consequence of these achievements we can prove that global mini- 
mizers are always isolated, although not necessarily unique, whereas local minimizers 
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may constitute a continuum of unstable equilibria. Hence, our analysis will shed light 
on fundamental properties, virtues, and limitations, of regularization by means of the 
Mumford-Shah functional MS, and provide a rigorous justification of the numerical 
results appearing in the literature. 



It is useful to show how the discrete functional (|10j) can be still expressed in terms 
of the sole derivatives for general K. As done before in the case K = I, and with 
the now usual identification u = (D^u, c), we can rewrite the functional in terms of 
derivatives and mean value as follows: 



J^jj;(z,c) = h 2 a\\KDlz-{g-Kc)\\l + h 2 Ymml\z itj \ 2 ^\, (11) 



P 

Note that in general we cannot anymore split orthogonally the discrepancy \\KD^ h z — 
(g — Kc) ||| into a sum of two terms which depend only on derivatives z and mean 
value c respectively. Nevertheless, for fixed z, it is straightforward to show that c = 
arg min c Jr^j^ {%■> c ) depends on z via an affine map. Indeed we can compute 



' {Kl,g-KD\z 
\\K1\\% 



1, 



where 1 is the constant vector with entries identically 1. Here we assume that 1 £ 
kerK, that is a necessary condition in order to be able to identify the mean value 
of minimizers (a similar condition is required anytime we deal with regularization 
functionals which depend on the sole derivatives, see, e.g., [191 EH])- By substituting 
this expression for c into (|lip , it is clear that the minimization of functionals (|10p can 
be reformulated, in terms of the sole derivatives, as constrained minimization problems 
of the form ©. 

2 Existence of minimizers for a class of discrete free- 
discontinuity problems 

In light of the observations above, we can transform the problem of the minimization of 
functionals of the type ([9]) , by means of discretization first and then reduction to sole 
derivatives, into the (possibly, but not necessarily) constrained minimization problem: 

f Minimize J r (u) = [\\Tu - gf^ + ^Zi min {N', r 2 } ] • _ 
\ subject to Qu = 0. 

Our first result ensures the existence of minimizers for the constrained optimization 
problem f|12|) : 

Proposition 2.1. Assume r > 0, and fix linear operators T : W N — ► M M and Q : 

1^ — > M A/ , which are identified in the following with their matrices with respect to the 
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canonical bases. We also fix g G M Af . The constrained minimization problem 

{Minimize J r {u) = [\\Tu - g\\^ + min {Wi\ 2 , r 2 } ] ^ 

1 subject to Qu = 0. 

has minimizers u* . 

Proof. We begin by noting that infg u= o J r (u) is well-defined and finite, since J r > 
is bounded from below. It remains to show that there exists a vector u* that satisfies 
J r (u* ) = inf ugR iv J r (u) . Towards this goal, consider the following partition V = 
{Uij} 2 =i of m N indexed by the subsets Xj of the index set 1 = {1, 2, N}, as follows: 

U %i := {u G M. N : \ Ui \ <r,ie Tj, \ty\ > r, i G T/Ij}. (14) 

The minimization of J r subject to Qu = and constrained to the closure of the subset 
U%. can be reformulated as a quadratic optimization problem, for which the classical 
Frank- Wolfe theorem [6] guarantees the existence of a minimizer uiXj). Now, since 
M = UjXj, the minimal value of J r subject to Qu = and over all of R N is just the 
minimal value from the finite set {J r (u(Ij)) : j = 1, . . . , 2^}; that is, 



min Jr(u) = min J r (u(Tj)) 
Qu=0 XjCJ 

and u* = arg ming u=0 J r (u) = u( arg minj jC j J r (u(Ij))) . □ 

In fact, Proposition 12,11 extends to a much larger class of free-discontinuity type 
minimization problems; by the same reasoning as before, we arrive at the more general 
result: 

Theorem 2.2. The constrained minimization problem 

(Minimize J?{u) = [\\Tu - g\\ 2 £ M + YliLi min{|uj| p , r p } ] 
1 subject to Qu = 0. 

has minimizers u* for any real-valued parameter p > 1. 

The Frank- Wolfe theorem, which guarantees the existence of minimizers for quadratic 
programs with bounded objective function, does not apply to the general case p > 1 
where the objective function Jr is not necessarily quadratic. Nevertheless, with the 
following generalization for the Frank- Wolfe theorem, Theorem 12.21 follows directly 
from a similar argument as for Proposition 12.11 



Proposition 2.3. Suppose A is an N x N positive semidefinite matrix, and suppose b 
and c are N x 1 vectors. Suppose also that X is a nonempty convex polyhedral subset 
of~R N . The convex optimization problem 

minimize u 1 Au + b l u + Yli<j<N c j\ u j\ P 
subject to u G X. 

admits minimizers for any real parameter p > 1, as long as the objective function is 
bounded from below. 
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For ease of presentation, we reserve the proof of Proposition 12.31 to the Appendix. 
From the proof of Theorem 12.21 one could in principle obtain a minimizer for Jf 
by computing a minimizer u(Zj) for each subset Xj C T using a quadratic program 
solver [6j, and then minimizing Jf over the finite set of points {u(Zj)}. Unfortunately, 
this algorithm is computationally infeasible as the number of subsets of the index 
set {1,2, N} grows exponentially with the dimension N of the underlying space. 
Indeed, the minimization problem (|15p is NP-hard, as the known NP-complete problem 
SUBSET-SUM can be reduced to this problem. A complete discussion about the NP- 
hardness of (|15j) can be found in [2]. 

3 An iterative thresholding algorithm for 1-D free-discontinuity 
inverse problems 

3.1 Overview of the algorithm 

In this section, we introduce an algorithm that is guaranteed to converge to a local 
minimizer of the real- valued functional Jf : £2(2) — ► R having the form 

J?(u) = \\Tu- g\\l {K) +J2™n{\ui\ P ,r p }, (17) 

subject to the conditions: 

• X and /C are countable sets of indices, and T : £2(1) — ► i%(1C) is a bounded linear 
operator, which is in the following identified with its matrix associated to the 
canonical basis; 

• the operator T has spectral norm [|T|| < 1. Note that this requirement is easily 
met by an appropriate scaling for the functional, i.e., we may have to consider 
instead 

J?(u) = 7 ||rn- 5 ||, 2 2(/c) +7^min{|n,r,rP}, 7 < 1. 

iei 

This modification leads to minor changes in the analysis that follows (see also 
Subsection [62]), and throughout this paper we assume, without loss of generality, 
that 7 = 1; 

• the parameter p is in the range 1 < p < 2. In case the index set I is finite, only 
the restriction p > 1 is necessary. 

We note that the scaled ID discrete Mumford-Shah functional \ J\j y /y l is clearly a 
functional of the form (|17p having r = 1/yfh, index set I = {1, . . . , |_ r2 J}) parameter 
p = 2, and operator T = D[/ r 2 ■ M^ 2 ]- 1 ^ RL r2 J . As shown in the Appendix, the 

operators D\/ r 2 satisfy the uniform bound 2 || ^ 1/2, independent of dimension, 
so a scaling factor is not needed in this case. 
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In the following, we will not minimize Jr directly. Instead, we propose a majorization- 
minimization algorithm for finding solutions to Jr, motivated by the recent application 
of such algorithms for minimizing energy functionals arising in sparse signal recovery 
and image denoising [9~], 121 j . More precisely, consider the following surrogate objective 
function, 

jP' surr (u,a) := J r p (u) - \\Tu - Ta\\\ {lc) + [|u - a\\\ {J) . u,ae 1 2 {T). (18) 

The surrogate functional J^' surr satisfies Jr ,surr (u, a) > Jf(u) everywhere, with equal- 
ity if and only if u = a, and is such that the sequence 

u n+1 = arg min JP' surr (u,u n ) (19) 

u 

obtained by successive minimizations of Jr' surr (u,a) in u for fixed a results in a non- 
increasing sequence of the original functional Jr(u n ) (see Lemmas 13.11 and I3.2p . We 
will study the implementation and the convergence properties of the iteration (|19p as 
follows: 

• in Section 3.2, we review the standard properties of majorization-minimization 
iterations, 

• in Section 3.3, we explicitly compute u-global minimizers of the surrogate func- 
tional Jr' surr (u, a), for a fixed; 

• in Section 3.4 we discuss a connection between the resulting thresholding func- 
tions and thresholding functions used in sparse recovery, 

• in Sections 3.5, 3.6, and 3.7, we show that the sequence (u n ) ng N defined by (fT9j) 
will converge to a stationary value u = argmin u Jr' surr (u, u), starting from any 
initial value u° for which Jr{u®) < oo, 

• in Section 3.8, we show that such stationary values u are also local minimizers 
of the original functional Jr that satisfy a certain fixed point condition, and 

• in Section 3.9, it is shown that any global minimizer of J$ is among the set of 
possible fixed points u of the iteration (|19|) . 

By means of the thresholding algorithm, we also show that global minimizers of the 
functional Jf are isolated, and moreover possess a certain segmentation property that 
is also shared by fixed points of the algorithm. 

3.2 Preliminary lemmas 

The lemmas in this section are standard when using surrogate functionals (see [21] and 
[9]), and concern general real- valued surrogate functionals of the form 

T surr (u, a) = T{u) - \\Tu - Ta\\\ {K) + \\u - a||| (J) . (20) 

The lemmas in this section hold independent of the specific form of the functional 
T : £ 2 (T) -> M+, but do rely on the restriction that ||T|| < 1. 
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Lemma 3.1. // the real-valued Junctionals J-(u) and J- surr (u,a) satisfy the relation 
(|2T)j) and the sequence (u n ) n ^ defined by u n+1 = argmin^g^j) JF s ™" r (u, u") is initial- 
ized in such a way that ^(u ) < oo, then the sequences !F{u n ) and F surr (u n+ ,u n ) are 
non-increasing as long as \\T\\ < 1. 



Proof. Since ||T|| < 1, also ||T*T|| < 1, and so the operator L = \fl — T*T is a 
well-defined positive operator whose spectrum is contained within a closed interval 
[c, 1] that is bounded away from zero c > 0. We can then rewrite J rsurr (u n+l u n ) as 
F surr (u n+1 ,u n ) = F(u n+1 ) + ||L(-u n+1 - u n )\\ 2 l2(iy from which it follows that 

Hu n+l ) < Hu n+1 ) + \\L(u n+1 -u n )\\l (I) 
= F surr (u n+1 ,u n ) 

< T surr \u n ,u n ) 

= Hu n ) 

< T{u n ) + \\L(u n -u n ~ l )\\l {x) 

= F surr {u n y u n - l ) y (21) 
where the second inequality follows from u n+1 being a minimizer oi J- surr (u,u n ). □ 
From Lemma 13,11 we obtain the following corollary: 

Lemma 3.2. As long as the conditions of Lemma \ 3.1\ are satisfied, one can choose 
N 6 N sufficiently large such that for all n> N, ||u n+1 — « n ||^ 2 (2:) < e, i.e., 

lim \\u n+1 -u n \\ HT) =0. 



n^oo 



Proof. From Lemma \'6.1\ it follows that J-(u n ) > is a nonincreasing sequence, there- 
fore it converges, and J-(u n ) — J-{u n+1 ) — ► for n — ► oo. The lemma follows from (|2ip . 
and the estimates 

Hu n ) - Hu n+l ) > \\L(u n+1 - u n )\\l (I) > (1 - ||T|| 2 )||n" +1 - u n \\l { xy 

□ 

3.3 The surrogate functional J^' surr , its explicit minimization, and a 
new thresholding operator 

It is not immediately clear that the surrogate functional Jr ,SUTT in (|18p is any easier 
to manage than its parent functional Jr ■ However, expanding the squared terms on 
the right hand side of (fT8|) . Jf ,surr (u,a) can be equivalently expressed as 

J?> surr (u,a) = \\u-(I-T*T)a + T*g\\ 2 e2{I) +Y,^H\ui\ p ,r p } + C 

iei 

1 +C, 



[(iH ~[a- T*Ta + T*g]i) 2 + min{M* r?} 
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where the term C = C(T, a, g) depends only on T, a and g. Indeed, unlike the original 
functional ', the surrogate functional J P ' SUTT decouples in the variables Ui, due to the 
cancellation of terms involving ||T-u||^ 2 . Because of this decoupling, global ■u-minimizers 
of Jr ,surr {u, a), for a fixed, can be computed component-wise according to 



Uj = are mm 



(t-[a-T*Ta + T*g]i) 2 + mm{\t\ p ,r p } , iel. (22) 



One can solve (|22p explicitly when e.g. p = 2, p = 3/2, and p = 1; in the general case 
p > 1, we have the following result: 

Proposition 3.3 (Minimizers of Jr' r (u, a) for a fixed). . 

1. If p > 1, the minimization problem u = argmin ug ^ 2 (j) Jf' su " *(u, a) can be solved 
component-wise by 

u i = H M ([a-T*Ta + T*g] i ), iel, (23) 
where H( p ^) : K ^ M is the 'thresholding function', 

rr m _ J ^(A), \\\<\'(r, P ) 

H mW-[ x * |A|>A'(r,„). (24) 

Here, F p ' 1 (X) is the inverse of the function F p (t) = t + |sgnt|t| p_1 , and X' := 
X'(r,p) £ (r, r + |r p_1 ) is the unique positive value at which 

(F- 1 (X l )-X') 2 + \F-\X')\'> = r p . (25) 

2. When p = 1, the general form (|23p still holds, but we have to consider two cases: 

(a) If r > 1/4, the thresholding function Hn r ) : IK ^ M satisfies 

{0, |A| < 1/2 

(|A| -l/2)sgnA, 1/2 < |A| < r + 1/4 = A'(r, 1) (26) 
A, |A| > r + 1/4 

(b) If, on the other hand, r < 1/4, the function H(i )r ) satisfies 

n m _ J 0, |A|<Vr = A'(r,l) m , 
H (l , r) (X) - | A) {M>V¥ (27) 

In all cases, the function H( P:T ) * s continuous except at X'(r,p), where i/( p . r ) has a 
jump-discontinuity of size 5(r,p) = \X' — H( p ^(X')\ > if r > 0. In particular, it holds 
that X'(r,p) > r while H( p ^(X') < r. 

We leave the proof of Proposition 13.31 to the Appendix. 
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Remark 1. In the particular case p = 2 corresponding to classical Mumford-Shah 
regularization (|12|) . the thresholding function H( 2 . r ) '■ K — ► M has a particularly simple 
explicit form: 

A/2, |A| < \/2r 



In addition to H^r) an d ^(l,r)> the thresholding operator iT(3/ 2)7 .)(A) corresponding to 
p = 3/2 can also be computed explicitly, by solving for the positive root of a suitable 
polynomial of third degree. In Figure [2] below, we plot Hr 2i i),H (3/2,1); an d ^(1,1) with 
parameter r = 1. For general noninteger values of p, H( p ^ cannot be solved in closed 
form. However, recall the following general properties of H^ pr y. 

• -ff( Pi r) is an °dd function, 

• i/( Pjr ,)(0) = 0, and 

• F fer) (A) = A once |A| > r + fr^ 1 . 

In fact, we can effectively ^recompute H^ p ^ by numerically solving for the value of 
Hf p ^(Xj) on a discrete set {Xj} of points Aj G (0, + r]. At Aj, one just needs to 

solve the real equation 

h i + ^r 1 - a j = ° (29) 

2 J 

which can be computed effortlessly via a root-finding procedure such as Newton's 
method: while hj satisfies (hj — Xj) 2 + (hj) p < r p , set H( p ^(Xj) = hj; once this 
constraint is violated, set H( p ^(Xj) = Xj. 



3.4 Connection to sparse recovery 

When p = 1 and r < 1/4, we know from Theorem 13.31 that the iterative algorithm 

u n+1 = argmin J?> surr (u, u n ) (30) 

u 

reduces to the component-wise thresholding 

< +1 = ^([u n -TW + T* 5 ]i), (31) 

where 

- { °; |a! ; ; < 32 > 

This thresholding function : M — > M is referred to as hard-thresholding in the 
area of sparse recovery, and the iteration (f3Tj) generated by successive applications of 
hard thresholding has been previously studied |9J. In particular, the iteration (f3Tj) was 
shown in (|3ip to correspond to successive minimization in u for fixed a of the surrogate 
functional J-r' surr (u, a) corresponding to the £q regularized functional, 

J^(u) = \\Tu - g\\l 2{K) + r\\u\\ eo{I) . (33) 



17 



(a) 




-3-2-1 1 2 3 



(c) 

Figure 2: The discontinuous thresholding functions #(3/2,1)) an d H( 2j i), with 

parameters p = 1,3/2, and 2, respectively, and r = 1. 
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Here, the £q quasi-norm ||«||^ (x) := Siex \ u i\o is defined component- wise by 



R o 



0, if Ui = 

1, otherwise 



The regularized functional J^(u) is related to the so-called K-sparse problem, 

minimize \\Tu - g\\\ 
subject to ||n||o < K, 

in that there exists a r, that depends on g and K, such that the solution to the -fT-sparse 
problem is the minimizer of the £q regularized functional. The X-sparse problem (|34|) 
is NP-hard in general [2], but under certain restrictions on the matrix T, it is possible 
to solve (|34p using fast algorithms. For example, if the m x N matrix T satisfies 
a certain restricted isometry property of order 2K |15| . and there exists a i'T-sparse 
vector satisfying the constraint Tu = g, then u is the unique solution to (|34p and can 
be recovered as the limit of the following iterative hard thresholding (IHT) [lOj : 

u n+1 = M K (u n -T*Tu n + T*g). (35) 

Here, the thresholding operator M s (u) sets all but the largest (in magnitude) s ele- 
ments of u to zero. This algorithm can be viewed as a variant of the hard thresholding 
algorithm (|3ip with threshold parameter r = r n adaptively adjusted at each iteration 
to remain consistent with the knowledge that a .fT-sparse solution exists. In fact, a 
modified version of IHT, called normalized iterative hard thresholding (NIHT), repre- 
sents the state of the art among a large class of algorithms that have been designed 
to solve the if-sparse problem ()34p under RIP or related assumptions on the matrix 
T [11], see also the paper repository [37J. Preliminary numerical results indicate that 
the performance of NIHT could be strengthened by replacing hard thresholding with 
a hybrid soft-hard thresholding, as shown at the top of Figure [2J as derived in Propo- 
sition 13.31 from the minimization of free-discontinuity functional Jf with parameters 
p = 1 and r > 1/4. 

Because a convergence analysis of the iteration ()3ip corresponding to hard thresh- 
olding has been studied already [9] , we omit the case p = 1 and r < 1/4 in the sequel. 

3.5 Fixation of the discontinuity set 

We prove now that the sequence (u n ) ng N defined by 

u n+1 = aigrnin J r p ' surr (u,u n ) (36) 

U 

or equivalently, according to Proposition 13.31 component- wise by 

< +1 = H M ([u n -T*Tu n + T*g] i ), i € X, (37) 
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will converge, granted that p > 1 and [|T|| < 1. To ease notation, we define the 
operator H : -^(X) — ► ^2(X) by its component- wise action, 

p(t*)]i := H M {[u - T*Tu + (38) 

so that the iteration (|37p can be written more concisely in operator notation as 

u n+1 =M{u n ). (39) 

We omit the dependence of HI on the parameters p, r, and the function g for continuity 
of presentation. At the core of the convergence proof is the fact that the 'discontinuity 
set', indicated below by X™ , of u n must eventually fix during the iteration (|37p . at which 
point the 'free-discontinuity' problem is transformed into a simpler 'fixed-discontinuity' 
problem. 

Lemma 3.4 (Fixation of the index set Z\). Fix p > l,r G R. + , and g G ^{K,). 
Consider the iteration 

u n+l = M(u n ) (40) 
and the time- dependent partition of the index set I into 'small' set 

IS = {iel:\u?\<\'(r,p)} (41) 

and 'large' set 

1? = {iel:\u?\>\'(r,p)} (42) 

where X' (r, p) is the position of the jump discontinuity of the thresholding function, as 
defined in Proposition \3.tft For JVgN sufficiently large, this partition fixes during the 
iteration u n+l = M(u n ); that is, there exists a set Xd such that for all n > N, Xq = Iq 
and X™ = T\ := X \ Zq. 

Proof. By discontinuity of the thresholding operator H( p ^(\), each sequence compo- 
nent 

< = H M ([u n - x - T*Tu n ~ l + T*g]i) (43) 

satisfies 

(a) \uf\ < X'(r,p) — 8{r,p) < X'(r,p), Hie Xq , or 

(b) \uf \ > X'(r,p), if i GXf. 

Thus, - uf\ > 6(r,p) if i G Z£ +1 C\Z?, or vice versa if i G X^nX^ 1 . At the 

same time, Lemma 13.21 implies 

|< +1 -ur|<|K +1 -ul, 2(J) <e, (44) 

once n > N(e), and e > can be taken arbitrarily small. In particular, (|44p implies 
that Xd and Z\ must be fixed once n > N(e) and e < 5(r,p). □ 
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After fixation of the index set I = {iGl: |u™| < \'(r,p)}, M(u n ) = Uj (u n ) and U Xo 
is an operator having component-wise action, for p > 1, 

~ { [[l-T*T)u + T*g)., ifieZi (45) 

Here, as in Proposition 13.31 the function F^" 1 is the inverse of the function F p (t) = 
t + \ sgnili^" 1 . Again, for ease of presentation, we omit the dependence of Uj on the 
parameters p, r, and g. For p = 1 the description is similar, and in general, one easily 
verifies the equivalence 

Ux (f;) = arg min Jj' surr (u, v) (46) 
where j^ urr j s a surrogate for the convex functional, 

Jl(u):=\\Tu-g\\ 2 MIC) + J2Wi\ P - (47) 

That is, fixation of the index set Iq implies that the sequence (it n ) ne pj has become 
constrained to a subset of £2 (I) on which the map EI agrees with a map Uj , associated 
to the convex functional Jj . As we will see, this implies that the nonconvex functional 
Jr behaves locally like a convex functional in neighborhoods of fixed points u = M(u), 
including the global minimizers of Jr ■ 

3.6 On the nonexpansiveness and convergence for T inject ive 

Given that M(u n ) = Uj (u n ) after a finite number of iterations, we can use well- 
known tools from convex analysis to prove that the sequence (u n ) n ^ converges. If 
the operator T*T : £2(1) ^(^) is invertible, or, equivalently, if the operator T 
maps onto its range and has a trivial null space - as, for example, does the discrete 
pseudoinverse d\ in the ID Mumford-Shah approximation - then the mapping Vj has 
the nice property of being a contraction mapping, so that a direct application of the 
Banach fixed point theorem ensures exponential convergence of the sequence (■u n ) ne pj 
after fixation of the index sets. 

Theorem 3.5. Suppose T : £2(1) —> £2(1^) maps onto £2^) and has a trivial null 
space. Let 5 > be a lower bound on the spectrum ofT*T. Then the sequence 

u n+1 = U(u n ), (48) 

as defined in (j38f) . is guaranteed to converge in norm. In particular, after a finite 
number of iterations N £ N, this mapping takes the form 

u N+m =Vf Q (u N ), meN\{0}, (49) 

and the sequence (u n ) n ^ converges to the unique fixed point u of the map Uj . More- 
over, after fixation of of the index set To, the rate of convergence becomes exponential: 

\W N+m - u\\ HI) = \\U? (u N ) - U? (u)\\mD < (1 " 6 r\W N ~ u\\ i2(T) , m G N \ {0}. 

(50) 

The proof of Theorem 13.51 is deferred to the Appendix. 
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3.7 Convergence for general operators T 

Unfortunately, if T*T is not invertible (that is, if 5 = belongs to its nonneg- 
ative spectrum), then the map Uj is not necessarily a contraction, and we can 
no longer apply the Banach fixed point theorem to prove convergence of the se- 
quence (n n ) n6 N. However, as long as ||T|| < 1, we observe by following the proof 
of Theorem (|3.5p that Uj is still non- expansive, meaning that for all v,v' £ £2(2), 
l|Uj (u) — Ui (v / )||^ 2 (x) < \\v — v'\\i 2 (x)- The following Opial's theorem [31], here re- 
ported adjusted to our notations and context, gives sufficient conditions under which 
non-expansive maps admit convergent successive iterations: 

Theorem 3.6 (Opial's Theorem). Let the mapping A from £2^) to £2(1) satisfy the 
following conditions: 

1. A is asymptotically regular: for all v € £2(1), ||A n+1 (t>) — A n (v)\\£ 2 (j) ~~ > for 



3. the set Fix(A) of the fixed points of A in £2^) is not empty. 
Then, for all v S £2^), the sequence (A n (v)) n £^ converges weakly to a fixed point in 



In fact, we already know that Uj is asymptotically regular, in addition to being non- 
expansive - this follows by application of Lemma 13. II and Lemma 13.21 to the functional 
J7j - Thus, in order to apply Opial's theorem, it remains only to show that Ui has a 
fixed point; that is, that there exists a point u E £2^) for which 



The following lemma gives a simple yet useful characterization of points satisfying the 
fixed point relation (|5ip : 

Lemma 3.7. Suppose p > 1. A vector u £ £2(1) satisfies the fixed point relation 
u = Uj (u) if and only if 



n 



00; 



2. A is non-expansive: for all v,v' £ £2(1), || A(z;) — A(v')\\t 2 rz\ < \\v — f'||^ 2 m; 



Fix(A). 




(51) 





Alternatively, if p = 1 and r > 1/4, u = Ux (it) is satisfied if and only if 





where in (J53j) , the index set 2q is split into 
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• Jq = {i G Jo : \ui\ < 1/2}, and 

• l^ = {iel : 1/2 < \ui\ <r + 1/4}. 

Again, recall the notation F p (t) = t + | sgnt|i| p_1 , and observe that the fixed point 
relation (|52p has a very simple expression when p = 2. The proof of Lemma 13.71 is 
given in the Appendix. 

The fixed point characterization of Lemma 13.71 will be crucial in the following the- 
orem that ensures the existence of a fixed point u = Ux (n). We remind the reader 
that until now, all of the results of Section 1.3 remain valid in the infinite-dimensional 
setting |Z| = oo. From this point on, however, certain results will only hold in finite 
dimensions; for clarity, we will account each such situation explicitly. 

Proposition 3.8. Infinite dimensions |Z| < oo, then there exist (global) minimizers 
of the convex functional, 

Ji(u) = \\Tu-g\\l {K) + J2Wi\ P , (54) 

for all p > 1, and any minimizer u of Jj o satisfies the fixed point relation u = Uj (n). 
Restricted to the range 1 < p < 2, the statement is true also in the limit |Z| = oo. 

Proof. In the finite-dimensional setting, minimizers necessarily exist for all p > 1 
according to Proposition 12.31 We now consider the general case. Consider the unique 
decomposition u = uq + u± into a vector no supported on Iq and another u\ supported 
on Zi, i.e., the vectors no G ^f°(Z) := {u G : n« = 0, i & 1±} and u\ G ^f x (Z) : = 

{n £ £2(1) '• 1H = 0, i G To}. Let V : n — > ui and "P -1 = Z — "P : n — > no denote the 
orthogonal projections onto the subspaces ^f 1 ^) an d ^f°(Z), respectively. Consider 
the operators To = TV 1 " and Zi = TV; note that clearly Z = To + Zi is satisfied. The 
functional (|54|) can be re-written with this decomposition according to 

j| o (n + m) = ||Z no + Zim - o||| (/c) + IKH^ 0(X) (55) 

where INI^o^ := (Siex * s the £ p -norm on vectors supported on Zo. 

Let V\ be the orthogonal projection onto the range of Zi in l<i(]0) (not to be confused 
with V, which operates on the space ^(Z)) and let = Z — V\ be the orthogonal 
projection in onto the orthogonal complement of the range of T\. Then, fix- 

ing no G £^-(Z), the vector V\{g — TqUq) G range(Zi) C is the solution to the 

minimization problem 

Vi(g-T u ) = arg min \\v - (g - T uo)\\j (lc) , (56) 

t)erange(Ti) v ' 
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so that minimizers of the functional T : 4°( J ) M+ defined by 

^(«) = \\T Q v + Vi{ g -T v)- g \\l {K) + \\v\\^ 0{x) 

= ll^-yllw + ll^ll^o (x) (57) 

with K : = To, and y := 5, will yield minimizers of J% . Functionals of the form 
()57p were studied in |21j : there, it is shown that as long as 1 < p < 2, T has minimizers, 
and any minimizer v can be characterized by the fixed point relation 

= F- 1 ([(I-K*K)v + K*y] i ), i G T ; (58) 

(recall that F~ l is the inverse of the function F p (t) = t + | sgnt|t| p_1 ). 
In the finite-dimensional setting |T| < 00, the Euler-Lagrange equations corresponding 
to minimizers of the convex functional T as in (|57[) imply the same fixed point relation 
([5HD also, for all p > 1. 

By Lemma l37T| the characterization ([58]) is equivalent to the condition 



p > 1: 
p=l: 



K*(y-Kv) 
K*{y-Kv) 



[^(y-iT^j.^lsgn^l^r 1 , (59) 



'ho 72 ' 1 - 721 ' ^fti-^L , ^ (60) 

= l/2sgnuj, if 1/2 < < r + 1/4. v y 



Making the identification no = v and Tini = Pi (3 — To^), and rewriting = "P^To, 
and y = V^g, the relations (|59|) and (|60p imply the full fixed point characterization in 
Lemma 13.71 □ 

Remark 2. The restriction p < 2 that is necessary for the results of this paper in the 
infinite dimensional setting |T| = 00 was only used in the proof of Theorem l3.8l where it 
comes from [21] and is needed there to prove the existence of minimizers of functionals 
T of the form (|57p . If that proof can be extended to functionals of the form (|57p for 
general p > 1, then the restriction p < 2 can be dropped in the current paper. For 
instance, if we additionally require that T is a bounded operator from £ P (I) to £2(1) for 
1 < p < 00 then the existence of minimizers would be guaranteed also for 1 < p < 00 
and |T| = 00. In this case we could consider a minimizing sequence (v k ) of J 7 , which 
is necessarily bounded in £ p . Therefore, there exists a subsequence (v kh ) which weakly 
converges in £ p to a point v* . This also implies the weak convergence of the sequence 
Kv kh in£ 2 ; note that (Kv hh , w) t2Xt2 = (v kh , K*w)i pX e p/ , for 1/p+l/p' = 1. By Fatou's 
lemma we obtain J-(v*) < liminf^ T{v kh ) and v* is a minimizer of T. However, we 
still require that p > 1 for the proof of Proposition 13.31 and for the results of the next 
section to hold. 

Combining the results from this section, we obtain: 
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Theorem 3.9. Suppose 1 < p < 2. Starting from any vP satisfying Jr{u®) < °o, the 
sequence (u n ) n£ N defined by u n+l = H n (n°) as in (|38|) will converge weakly to a vector 
u G £2(2) that satisfies the fixed point condition, 

1. \ui\ > X'(r,p), ifieli = {jel: \v,j\ > r} 

2. \ui\ < F-^X'^p)), forp> I, ifiel = {j el: |%| < r}, and 

3. (a) Ifp>\: 



[T*{g-Tu) 



(b) Ifp= 1 andr> 1/4: 



0, if \m\ > X'(r,p) 

F p (ui)-Ui, if\ui\<X'(r,p)-5(r,p) 



(61) 



T*(g-Tu) 
T*(g-Tu) 
T*(g-Tu) 



.€[-1/2,1/2], \ui\ < 1/2 

. = l/2sgn«j, 1/2 < \ui\ < r — 1/4. 

• = 0, |«*|>r +1/4. 



(62) 



If the index set \I\ < 00 is finite dimensional, the theorem holds for all p > 1. 



Proof. By Lemma 13. 4| the map u n+1 = M(u n ) becomes equivalent to a map of the 
form u n+1 = \Jx (u n ) after a finite number of iterations N G N. By Lemma 13.41 and 
Proposition 13.3} the subset Xq C I separates I in the sense that, for all n > N, 

• \uf\ <F p 1 (X'(r,p)), if ie X , 

• \uf\ > X'(r,p), if i G Ji = J\J - 

That the sequence (u n ) n eN converges to a fixed point of the map Uj follows from 
Opial's theorem applied to the map Ui : 

1. the asymptotic regularity of Uj is a consequence of Lemmas 13.11 and 13 . 21 

2. the nonexpansiveness of Ux follows from the proof of Theorem (|3.5p . and 

3. Theorem 13.81 guarantees that the set of fixed points of Uj in £2(1) is nonempty. 

The limit u of the sequence (u n ) will satisfy the fixed point conditions of Lemma 13.71 
Since weak convergence implies component-wise convergence, it follows for all i G Iq 
that 



lim 1*4*1 

n— >oo 



< X'(r,p) - 5(r,p) 
and the respective lower bound \uf \ > X'(r,p) holds analogously for i G 1\. 



(63) 
□ 
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4 On minimizers of J* 



We are now in a position to explore the relationship between limit vectors u of the 
iterative thresholding algorithm (j38[) and minimizers of the free-discontinuity functional 
Jr (|17p . As a first but important result in this direction, 

Theorem 4.1. A "point u satisfying the fixed point relation of Theorem \3.9\ is a local 
minimizer of the functional Jf defined in (|17|) . 

The proof of Theorem 14.11 is omitted at present but can be found in the Appendix. 
This result should not be surprising, however. Due to the separation of the entries 
of any fixed point u, such that u~i < r < uj for i £ Iq and j £ Xi, we have also 
To = {i G I : \u{\ < r} and X\ = {j € X : \uj\ > r} for all u E B(u,e(r)), where 
B(u~,s(r)) is a ball around an equilibrium point u of radius e(r) > sufficiently small. 
On this neighborhood B(u, e(r)) of u, the functional Jr is convex. Since u is obtained 
as the limit of a sequence (u n ) in B(u,e(r)) for which the sequence Jr(u n ) is nonin- 
creasing, one would expect that u minimizes J?{u n ) within this neighborhood. 

More surprising is that global minimizers of Jr are also fixed points, as shown in 
the following theorem. Even though the existence of such minimizers is only guaran- 
teed in the finite-dimensional setting (see Proposition 12. 3p . the following result is not 
restricted as such. 

Theorem 4.2 (Global minimizers of Jr are fixed points u = M(u)). Any global mini- 
mizer u* of Jr satisfies the fixed point condition of the map M that is given in Theorem 

EE 

The proof of Theorem 14.21 is rather long and we defer it to the Appendix. We 
reiterate once more that on a ball B(u,e(r)) around an equilibrium point u of radius 
e(r) > sufficiently small, the functional Jr is convex; following the proof of Theorem 
14.21 we see that J r p is in fact strictly convex whenever u = u* is a global minimizer, 
since the restriction of T to the subspace (^(T) C l<z(T) of vectors with support in T\ 
must be an injective operator in this case. Hence a global minimizer is necessarily an 
isolated minimizer, whereas we cannot ensure the same property for local minimizers if 
T has a nontrivial null-space; in this case, local minimizers may form continuous sets, 
as it is shown in the bottom-right box of Figure [3l We conclude the following remark. 

Corollary 4.3. Minimizers of Jr are isolated. 

5 2-D free-discontinuity inverse problems and a projected 
gradient method 

As presented in Subsection 11.4.21 the minimization of the discrete functionals for 2-D 
free-discontinuity inverse problems has the general form 

| Minimize J?(u) := \\Tu - ff||f 2(jC) + Eiex min {\ u i\ p , r ' p } ^ 
\ subject to Qu = 0, 
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where Q : £2 (I) — ► izQC') is a suitable bounded linear operator. 

We can not directly generalize the analysis of the previous sections to (|64p . as the 
introduction of surrogate functionals does not decouple the constraint Qu = 0. How- 
ever, when the index set I is finite dimensional, we can still say something. For ease 
of presentation, we will assume p = 2 throughout this section. 

First, recall that the partition argument of Theorem l2.2l guarantees that the constrained 
minimization problem (|64p has a minimizer. Again, one could in theory obtain such 
a minimizer by computing a minimizer u(Xq) for each subset lo C I = {1,2, ...,N}. 
Of course, such an algorithm is computationally infeasible as the number of subsets of 
the index set {1,2,..., N} grows exponentially with the dimension N of the underlying 
space. 

We propose instead the following more practical projected gradient algorithm: for 
any initial u°, iterate 

u n+1 = V kcT(Q) [H %r) (u n + T*(g- Tu n ))] , (65) 

where 7\er(Q) is the orthogonal projection onto the null-space of Q. This projection 
can be easily computed explicitly by 

T^kerS = I-Q)Q 

= I-Q*(,QQ*)- X Q, 

where the latter equality holds whenever Q is a full-rank matrix, as the one associated 
to the Schwartz conditions ©. The analysis of the algorithm ()65[) is beyond the scope 
of this paper; nevertheless, note that locally around any minimizer, the functional is 
convex, and that projected gradient iterations are well-known methods for constrained 
minimization of (non-smooth) convex functionals, see for instance pQ. 

6 Numerical Experiments 

6.1 Dynamical systems, stability, and equilibria 

Iterative thresholding algorithms have a natural interpretation as discrete-time dy- 
namical systems with nonsmooth right-hand-side, and can be associated to continuous 
dynamical systems of the type: 

u(t) = F(u(t),t) 

= T(H (Ptr) (u(t)+T*(g-Tu(t)))- U (t)) , t>t , r>0. 

The study of the existence, uniqueness, stability, and long-time behavior of these ODE's 
is of fundamental interest in order to clarify also the stability properties of iterative 
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thresholding algorithms. Indeed, other than soft-thresholding iterations |21| . the cor- 
responding right-hand-side is not Lipschitz continuous and can even be discontinuous, 
as is the case for free-discontinuity problems. In [14\ [23] conditions are established 
for the existence, uniqueness, and continuous dependence on the initial data (at finite 
time) of solutions of dynamical systems with discontinuous right-hand-side. However, 
very little is known about long-time properties of such dynamical systems and about 
the nature of their equilibrium points. 

For several continuous thresholding functions, such as the ones introduced in [21\ [27J 
126] . one can easily show, for instance by means of T-convergence arguments, that equi- 
librium points depend continuously on the parameters of the thresholding, see, e.g., 
[26} Theorem 5.1]. Nevertheless, for discontinuous thresholding functions H^ p ^ such as 
those studied in this paper, sudden bifurcation phenomena and instabilities do appear 
in general. Figure [3] shows that multiple equilibrium points can exist for these thresh- 
olding operators and their number may depend discontinuously on the thresholding 
shape parameters. Moreover, as established in Theorem 14.21 global minimizers of Jr 
are always stable equilibria and isolated points, while local minimizers can be unstable 
equilibria and form a continuous set, as shown in the bottom-right box of Figure O 

6.2 Denoising and segmentation of 1-D signals and digital images 

In this subsection, we are concerned with numerical experiments in the use of an 
iterative thresholding algorithm for the minimization of 

N 

J^(u) := \\D\u - g\\\ + 7 Y, min{u 2 , r 2 }, (66) 

i=l 

modelling problems of denoising and segmentation. 

Note that we introduced an additional regularization parameter 7 > which has 
the sole effect of modifying the thresholding function H^ 2 ^,-y) as follows 



H {2 ,r,y)(z) = { 1+7 ' v / (T^F +1 -^ (67) 

[ z, otherwise. 

This thresholding function can be again easily computed by means of an argument 
similar to the proof of Proposition 13.31 In Figure H] and Figure [6] we show the results 
of applications of the iterative thresholding algorithm ([37]) and the projected gradient 
algorithm (j65[) respectively. In Figure [5] we show a comparison of the use of the 
thresholding ff/ 2 ,ryy) an d tire soft-thresholding S-y (see its definition in (|106p ); the for- 
mer promotes the minimization of the Mumford-Shah constraint MS and piecewise 
smooth solutions, whereas the latter promotes the minimization of a total variation 
constraint [36], which is also well-known to produce (almost) piecewise constant so- 
lutions with a perhaps unwanted 'staircase effect'; see also [191 Section 4] for details. 
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-100 -50 50 100 -60 -40 -20 20 40 60 



Figure 3: We show patterns in M. 2 formed by initial points u° colored according to 
the corresponding equilibria computed as limits of the iterative thresholding algorithm 
(|37p . For invertible 2x2 squared matrices T, the equilibria are isolated and the region 
of initial points for which (|37p converges to a given equilibrium point do partition the 
space into sets which might be disconnected. Structures of the partition generated by 
different matrices T are exemplified in the top boxes and in the bottom-left one. In 
the bottom-right box we show the pattern related to iterations where the 2x2 squared 
matrix T has nontrivial null-space. We can see again that global minimizer are isolated 
and correspond to the points on the axes, whereas local minimizers are continuously 
distributed along an affine space generated by the kernel of T. It is not difficult to 
show that this structure always occurs for such matrices. 
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Original noisy 1D signal 



Computed denoised 1 D signal 
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Comparison of the noisy and the denoised signals 




Discrete derivative of the computed denoised signal 
200 r 
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Figure 4: We show the application of the iterative thresholding algorithm (|37p for the 
classical denoising problem of 1-D signals where K = / in ©, and hence T = D^. The 
thresholding parameters used for the numerics are r = 2.2 and 7 = 0.002. 
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Computed denoised 1 D signal by iterative thresholding Computed denoised 1 D signal by iterative soft-thresholding 
to minimize the Mumford-Shah functional to minimize the total variation of the signal 




50 100 150 200 250 50 100 150 200 250 



Discrete derivative computed by iterative thresholding Discrete derivative computed by iterative soft-thresholding 
to minimize the Mumford-Shah functional to minimize the total variation 

200 i . . . . n 200 1 . . . . n 
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Figure 5: A comparison of the denoising of the signal in Figure H] by means of the al- 
gorithm ([37]) and by iterative soft-thresholding [21] applied to discrete derivatives. We 
can appreciate how the algorithm (|37p promotes piecewise smooth solutions, whereas 
the iterative soft-thresholding promotes the total variation minimization with the in- 
troduction of a 'staircase effect'. The thresholding parameters used for the numerics 
are r = 2.2 and 7 = 0.002 for fl67|), and 7 = 0.002 for the soft-thresholding (T1061) . 
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Figure 6: We show the application of the projected gradient algorithm (|65p for the 
classical denoising problem of digital images where K = I in Q, and hence T = D^ h . 
The thresholding parameters used for the numerics are r = 5 and 7 = 0.005, and the 
image size is 80 x 80. The anisotropic effects of ([7]) are clearly visible, suggesting that 
for more effective image denoising, iterative thresholding on an isotropic (or direction- 
independent) variant of the 2D Mumford-Shah functional should be studied; see |18}ll2j 
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6.3 Inverse problems 

As already mentioned in Subsection ll.4.3l the Mumford-Shah term MS(u) = Jq\ s |V«| 2 + 

(37i d ~ 1 (S u ) is also used for regularizing inverse problems involving operators T which 
are not boundedly invertible. In this section we present two numerical experiments 
on the use of algorithms (|37p and (|65p for ID interpolation (Figure [7]) and for 2D 
inpainting (Figure [8]) respectively. In this case the operator T is a multiplier by a 
characteristic function of a subdomain, i.e., Tu := \D • u, for DC !J; see [23] for other 
numerical examples previously obtained with the Mumford-Shah regularization. 



Interpolation of the signal computed with the iterative 
thresholding to minimize the Mumford-Shah functional 

1 i 1 , , , 




250 



Interpolation of the signal computed with the iterative 
soft-thresholding to minimize the total variation 

1 | , . . 1 n 
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Discrete derivative of the interpolation computed by 
minimizing the Mumford-Shah functional 

50 r 




Discrete derivative of the interpolation computed by 
minimizing the total variation 
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Figure 7: Interpolation of an incomplete signal by means of the Mumford-Shah regu- 
larization and the total variation minimization provided by respective iterative thresh- 
olding algorithms. The red interval is the region where no information on the original 
signal is provided. The thresholding parameters used for the numerics are r = 2.2 and 
7 = 0.002 for ([67D, and 7 = 0.002 for the soft-thresholding (fT06L 



In Figure [7J we show the reconstruction of the noiseless signal of Figure 0] provided 
information only out of the interval [100, 150] which has to be restored. On the left 
boxes we show the results due to algorithm (J37J) and on the left ones the solution 
computed by iterative soft-thresholding. In the former the solution is again piecewise 
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smooth and in the latter a (almost) piecewise constant solution is instead produced. 



Original image with a missing part Derivative of the computed inpainted image 




Computed inpainted image by minimization of the 

Mumford-Shah functional via iterative thresholding Original image without missing part 




Figure 8: Inpainting of a binary image by means of algorithm (|65p . The occluded 
discontinuity is correctly recovered as already observed in [23]. The thresholding pa- 
rameters used for the numerics are r = 8 and 7 = 0.0001, and the image size is 40 x 40. 

In Figure [8] we show the inpainting of a binary image with a missing information 
right at its center which is occluding precisely a discontinuity. As already shown in 
|23j the inpainting process produces minimal length connections of the discontinuity 
set as long as the inpainting region, i.e., the missing part, is not too large. 

7 Appendix 

7.1 Proof of Proposition 12.31 

First, we recall Weierstrass' Theorem, which is used in the proof of Proposition 12.31 
below. 

Theorem 7.1 (Weierstrass' Theorem). The set of minima of a convex function f over 
a subset X C ~ML N is nonempty and compact if X is closed, f is lower semicontinuous 
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over X , and the function f , given by 

f = { f{x) >f x£X > (68) 
[ oo otherwise, 

is coercive, i.e., for every sequence (xk) C X s.t. \\xk\\ — > oo, we have lim^oo f(xk) = 
oo. 

The following two lemmas will be helpful in the proof of Proposition 12.31 
Lemma 7.2. Let F(u) be a convex function defined on W N having the general form 
F(u) = u l Au + b l u + X/i<i<Ar l u il P ' f or some P > 1- Fix x and d in R . If F is 
bounded above and below on the ray {x + td,t > 0} ; then F is constant on the line 
x + td. 



Proof. Let fi(t) = F{x + td), and note that /i is convex because F is convex. Moreover, 
\x has the general form fj,(t) = P{t) + Yli<j<N c j W x j + ^j\\ p wri ere P(t) is a polynomial 
in t of order at most 2. Without loss of generality, suppose < fi(t) < 1 for all values 
of t € M + . Then there exists a sequence of points (t n )neN ; t n — > oo for n — > oo, for 
which /i(t„) is a convergent sequence; let us denote the limit of this sequence by 7. 



1. Case 1: 1 < p < 2. To repeat, 

lim /x(t n ) = lim P(t„) + \J Cj + tn^i || p = 7- (69) 



E 



Since = limn^oo /j,(t n )/t^, it follows that all coefficients in of degree 2 must 
vanish. In turn, then, = lim n _^. 00 /x(t n )/tn> has the implication that for each 
j, one of the coefficients Cj or d,- must vanish as well. Following in the same 
manner, we conclude that all linear coefficients in n{t) also vanish, leaving only 
the possibility that fj,(t) = 7 is a constant function. 

2. Case 2: p > 2: The proof in this case is identical to that of the previous case, 
and as such we leave the details to the reader. 

□ 

Lemma 7.3. Suppose F is a convex function defined on M> N that is bounded from 
below, and has the property that if F is bounded above on a ray {x + td,t £ then 
F is constant on the line x + td. Then if F is constant on the line x + td, F is also 
constant on any parallel line y + td. 

Proof. Let fj,(t) = F{x + td) which by assumption is a constant function fj,(t) = 7, 
and let v(t) = F(y + td). Fix t £ M + , and let z be the point z = x + 2(y — x), i.e. 
y = \x + \z. By convexity of F, we have that 

F(y + td) = F(^z+ i(s + 2td)) < ^F(z) + ^(2t) = a, (70) 

for a constant a. It follows that F is bounded above by a on the ray {y + td,t £ 
from which it follows, by assumption, that F is constant on the line y + td. □ 
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We now prove Proposition 12.31 Choosing xq £ X, we define the (nonempty) set 

M :=Xn{x(E R N , F(x) < F(x )}. (71) 

Obviously, the set M is convex and closed. By assumption, F is bounded from below 
on X and hence on M. Therefore, if M is bounded, then Weierstrass' Theorem yields 
the desired result. 



Thus, we may assume that M is unbounded. Then, the convexity of M implies that 
M contains a ray r = {z + td,t > 0}. Denote by r%,r2, ■■■,rj a set of J rays in M 
corresponding to linearly independent vectors d±, ...,dj, so that any ray in M can be 
expressed as a linear combination of the r\, ...,rj. By definition of M and by the as- 
sumption, F is bounded on M, hence, F is constant on each of the the lines Zj + tdj, 
according to Lemma (j7.2|) . From Lemma (|7.3p . it follows that F is constant along 
each line x + tdj for arbitrary x G R , from which we deduce that F is constant along 
any line x + td for arbitrary d £ Y = spanjdi, ....,dj}. Thus, we project X onto the 
subspace of W N that is orthogonal to Y; call this subspace X. 



From the foregoing arguments, we have 

m£F(u) = 'm£F(u) (72) 
x x 

As X is still a convex polyhedral set, and by construction M = XPi{x £ M. N } contains 
no rays, Weierstrass' Theorem yields the desired result. 



7.2 On uniform boundedness of \\D 



The aim of the second part of the appendix is to prove the uniform bound \\D*. \\ < 1/2 
eluded to in Section 3.1. Again, \\A\\ denotes the spectral norm of the matrix A, and 
D\ : W 1 - 1 -> R n is the pseudo -inverse of the discrete derivative matrix as given by 
dS}, with the identification n = \ l/h\. From the expression for Dh, and the knowledge 
that DfrD^ = I is the identity operator and D h Dh = (D^D^)* is self-adjoint, the 
n x (n — 1) matrix D^ h is identified as follows: 



( -(n-1) 
1 
1 



-(n-2) 
-(n-2) 
2 



-(n-3) 
-(n-3) 
-(n-3) 



V 



1 



n 



1 \ 

-1 

-1 
"I / 



(73) 



It is well-known that the spectral norm of an m x n matrix can be bounded by the 
more manageable entry-wise Frobenius norm, according to 



\A\\ < \\A\ 



m n 
i=l j=l 



'■J 



(74) 
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As such, we need only to bound the sum of the squares of the entries of D\ . The sum 
S\ = Y^jZi \di j | 2 over entries in the first row of D^ h is given by S\ = (n — l)(2n — 
l)/(6n ), using the familiar formula J2f=i J 2 = ^N{N+1)(2N+1). The analo gous sum 
over entries in the j th row of d\ is seen inductively to satisfy S° n = S^— + ^%r^ • 
The total sum S n = Y^j=i ^ s then S n = | — — ^ , and we arrive at the desired uniform 
bound: 

Pill < VK< -4 < 1/2. (75) 



7.3 Proof of Proposition 13.31 

In order to help the reading of the current proof, as well as the proofs of Theorem 13.91 
and Theorem l4.2l in later appendices, we report in Table 1 the notation of the functions 
used in the proof of Proposition 13.31 for the definition of -ff(p, r ) ■ 



L P (t, A) 


= (t- X)' 2 + mm{\t\ p ,rP} 


G p (t,X) 


= (t-X)' 2 + \t\P 


F P (t) 


= t + | sgntltl^- 1 , p > 1 


S P (X) 


= G p (F~ i (X), A) = (F-t(X) - A) 2 + \F~ l (X)\p, p > 1 




= argmint>o L p (t, A) for general A > 0, p > 1 




= argmhio<£< r G p (t, A) = F~ 1 (A) for < A < r 




= | F-\X), if G p ( V(A), A) < rv for A > r 
\ A, else 



Table 1: Notation of the functions involved in the definition of Ff( Pt r) as m the proof 



of Proposition 13.3 



Consider the functions 



and 



L p (t,X) = (t-X) 2 + mm{\t\ p ,r p }, 



G p (t,X) = (t-X) 2 + \t\P. 



The proof reduces to solving for 

H(p >r )(X) = argmmLp(t, A) 



(76) 
(77) 

(78) 



as a function of A E R. Since L p (t, X) = L p (—t, —A), the function H^ p ^(X) will be odd, 
and since also H^ p r ^(0) = 0, we can, without loss of generality, restrict the domain of 
interest to A > 0. On this domain, H( p ^(X) = argmin tS R L p (t, A) is nonnegative, since 
L p (t,X) < L p (—t,X) when t > and A > 0. Hence, we can restrict the minimization 
of L p (t,X) to t > 0. 



It will be convenient to split the proof into two cases: 1 < p and p = 1. 
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1. We first analyze the case 1 < p. 
Note that 

argmin L p (t, A) = argmin(t — A) 2 

t>r t>r 

= max{A,r}, (79) 
so that the minimization (|78p naturally splits into the following two cases: 

(a) If A < r, the minimizer has to be searched in [0, r], hence 

fl(p, r )(A) = argmin G p (t, A) = F p -1 (A) < A (80) 

where F~ 1 (X) is the functional inverse of the increasing, and continuous 
function 

F p (t)=t + ^ S gnt\t\ p -\ (81) 

(b) On the other hand, if A > r, the minimizer has to be searched in [0, A], 
hence 

f F-\X), if Gp(F~ 1 (\),X) < r p 
^W A) " \ A^ else 

By implicit differentiation of the functional relation F p (F p ~ 1 (X)) = A, it is clear 
that the functions F~ 1 (X) and S P (X) := G P (F~ 1 (X), A) are strictly increasing 
functions in A. Indeed, we have the bounds 

< ^(A) = (F^F-HX)))- 1 = (l + P -^^{F-\X))^Y < 1, 



d S P (X) = Ig p (F p \X),X)^-F p 1 (X) + ^-G p (F p \X),X) 



and 
d 

^X"fw d t "^p v '- n "-' d\~ p v ' v ' 9A 

= (2(F- 1 (A) - A) +p ( F - 1 (X)y- 1 )±F-\X) - 2(F p 1 (X) - A) 

= 2(l- ^F-\X^j (A - F-\X)) +p -±F- 1 (X)(F- 1 (X)y- 1 > 0, 
since < ^-^(A) < 1, and 

0<F- 1 (A)<A. (82) 

Also observe that Ff x (r + § r p ~ x ) = r, and S p {r + \r p ~ x ) = r p + ^r 2p ~ 2 > r p . 
This leads us to immediately conclude that 

(i) If A < r, then H {p>r) (X) = Ff 1 ^) (from (|80|)). 
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(ii) If A > r + frP" 1 , then S p {\) = G p (F~ l (\), A) > r p , so that H {pjr) (X) = A. 

(in) Since S p (r) < r p while S p (r + |r p_1 )) > r p , the intermediate value theorem 
implies that there exists a unique value X'(r,p) lying strictly within the 
interval (r, r p_1 (| + r 2_p )) at which 



S P (X>) = (83) 



and 



^,r)(A) - | / A>A'(r,p) ' ( } 

At A', H(p r \(X') = argmin 4 >o L p (t, A') is not uniquely defined and is realized 
at F~ 1 (X') and at A'. In this case, we identify H( p ^(X') = F~ 1 (X) for the 
sequel; as will be made clear, this will not cause problems in the ensuing 
analysis. Finally, note that 

(iv) At A', the function -ff( p , r ) nas a discontinuity 5(r,p) = X' — H(p r \(X') that 
is strictly positive, as long as r > 0. Indeed, on the one hand, we know 
that X'(r,p) > r, on the other hand, Hr Pir \(X') < r. This follows because 
H M (X') = F-\X'), and 

(F-\X')y < (F-\X') - A') 2 + \F-\XT = S P (X') = H\ 

2. The analysis of the case p = 1 is left to the reader since it follows a similar 
argument as for p > 1. 

7.4 Proof of Theorem 1531 

We assume that the operator T*T : £2(1) —> £2(1) is nonnegative, so that its spectrum 
lies within an interval [S, 1] with 5 > 0, and the operator I — T*T has norm T*T\\ < 
1 — 5. In particular, if T*T is invertible, then the inequality 5 > is strict, and so 
||J-r*T|| <1-S<1. 

We wish to show that the map Uj with component- wise action 

[Ux » n]i " \ (ll-T*T)u + T*g)., Hi six (§5) 

is a contraction. To this end, let v,v' be arbitrary vectors in t<i(I). 

1. If the index i £ Z\, then 

|[U2b(«)]i - [Vz (v%\ = \[(I-T*T)(v-v%\; 

2. If the index % £ Xq, then we split the analysis in two cases p > 1 and p = 1: 
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(a) for p > 1, we have 
\[VT (v)h ~ [Vx (v%\ = \F-\[(I - T*T)v + T*g] t ) - F p \[(I - T*T)v' + T*g]i) 

A. Fp ^(0[(I-T*T)(v-v% 

< \[(I-T*T)(v-v%\ (86) 

where the second equality is an application of the mean value theorem, 
which is valid since F~ 1 (X) is differentiable. The final inequality above 
follows from implicit differentiation of the relation 

F- 1 (F p (t))=t 

and the observation that \^F p (t)\ > 1 (see the proof of Proposition I3.3() : 

(b) for p = 1, by analyzing all cases, we get also that 

|[Uxo(«)]i-[Uxo(^)kl < \[(I-T*T)(v-v%\ (87) 

Together, we have 

||U X » - Ux K)lll(x) = E l[ u ^»k - [Vi (v%\ 2 

A |2 



< Y^\((I-T*T)v-v') 

2 



i€I 

-T*T)v-v' 112 



< \\I-T*T\\ 2 \\v-v'\\l {I) 

< (l-*)||u-t/||2 aCr) . (88) 

As Uj is a contraction, we arrive at the stated result by application of the Banach 
Fixed Point Theorem. 

7.5 Proof of Lemma 13.71 

If z G Xi, then Ui = Ui + \ T*(g— Tu)] ., which is satisfied if and only if \T*(g — Tu)] . = 
as stated. It remains to analyze the case i 6 lo, and, again, we split the argument in 
the cases p > 1 and p = 1. 

1. First suppose p > 1. Using the notation A = Ui + [T*(g — Xu)]^ the fixed point 
characterization (|5ip translates to 

F- 1 (A) = n i . 

But of course A = F p {ui) is the unique value at which i^" 1 (A) = Uj, and so this 
implies that 

[T*{g — Tu)] i = F p (ui) — Ui, (89) 

and, by reversing operations, the relation (|89|) in turn implies the fixed point 
condition (loTTl. 



2. The case p = 1, which is similar, is left to the reader. 
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7.6 Proof of Theorem 14.11 

The proof will be much simplified by the following lemma which characterizes vectors 
such as u that satisfy the fixed point relations (|6ip or ([62]): 



Lemma 7.4. If u and v are such that 

J r P' surr (u + v,u)- \\v\\l {T) > JP' surr (u, u) = J p (u), (90) 
then J?{u + v) > J r p {u). 

Proof. For any u and v, the following holds because \\L\\ < 1: 

JP(u + v ) = jP' SUTT {u + v,u)- \\Lv\\l (I) > J r P> surr (u + v,u)- \\v\\l {1) . (91) 

If in addition u and v satisfy (|9U|) . then the desired result is achieved by virtue of the 
equality J^ ,su " (u,u) = J p {u). □ 

Let us show now the proof of Theorem 14.11 By Lemma 17.41 it suffices to show that 
at a fixed point u defined by (|6ip or (|62p . any perturbation 5h £ ^2(1) with norm 
||^||£ 2 (j) < min{[A'(r,p) - r], [r - H M (X')]} will satisfy 

jr urr {u + 6h,u) - jr urr (u,u) > \\6h\\l {X y (92) 

After expanding the left-hand-side above, the inequality ([92]) is seen to be equivalent 
to 

2 Shi[T*(Tu - g)]i + [min{|ui + Shi\ p ,r p } - mm{\ Ul \ p , r p } > 0. (93) 

At this point, it is convenient to consider the summation over i £ Xq and i £ Ii 
separately. 

By Lemma 13.4] the first summand above vanishes over X\ and 

1. if 1 < p, then Y.iax Shi[T* (Tu - g)]i = - J2 ieIo Shi sgn« i ||« i | p_1 ; 

2. ifp= l,then'£ier8h i [T*(Tu-g)] i = -1/2 Shi sgn«i+£ i6X « Shi [T*(Tu- 

g)]i- 

With respect to the second summation, observe from Proposition 13.31 that for all 1 < 
Pi \ui\ > X'(r,p) > r for i £ I\, so that this summation vanishes over X\ for any 
perturbation 5h satisfying the component- wise inequality \Shi\ < X'(r,p) — r. Similarly, 
\ui\ < H^ p ^(X') < r for i £ Xq, so that for any perturbation Sh satisfying component- 
wise \Shi\ < min{[A'(r,p) — r], [r — H^ p ^(X')]}, we have that 

J2[™H\ui + Shi\P,rP}-mm{\ui\P,rP}\ = J2 K + Shi\ p - \ui\ p ■ (94) 
The desired result follows if we can show that 
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1. 1< p < 2: [\v,i + 5hi\ p - \ui\P - ^p^gnu^u^ 1 ] > 0, for all i E 1 

2. p= 1: 

(a) \5hi + Ui\ — \ui\ — <5/ij[sgnuj]] > for all i G 2q, and 

(b) 5hi[T*(Tu - g)]i + \Shi\ > 0, for all i G T§. 

The inequality in 2(6) follows directly from Lemma [3.4t by symmetry, 1 and 2(a) follow 
if, for any u > 0, 

min \f(v) := \u + v\ p — u p — pu p ~ l v\ = min (u + v) p — vP — pu p ~ 1 v > 0. (95) 

v€R v>—u 

When p = 1, the right-hand-side is identically zero and the result holds. When 1 < 
p < 2, differentiating the right-hand-side gives that f(v) has a local minimum at v = 0, 
at which /(0) = 0, and, at the endpoint, f(—u) = (p — l)n p_1 > 0. 

7.7 Proof of Theorem [4721 

Suppose that u* is a minimizer of the functional J7j?. Consider the partition of the 
index set X into Zq = {i £ X : \u*\ < r} and X\ = {i £ 1 : \u*\ > r}, and note that 
< oo, or else |c7^(a*)| would not be finite. As in the proof of Theorem (|3.8j) . 
consider the unique decomposition u* = Uq + u\ into a vector u* Q supported on Iq 
and another u\ supported on T\. Again, let V : u — > u\ and V ± = T — V : u — » no 
denote the orthogonal projections onto the subspaces and t^°{I), respectively, 

and consider the operators To = TV 1 - and T\ = TV. 

By minimality of n*, if we fix Uq, the vector u\ satisfies u\ = argmin Jr\{ z )-, 
where 

J? A (z) := HT^-^-To^H^^ + ^minil^rrP}. (96) 

ieii 

Since all coefficients in u\ have absolute value Kni)!) > r, the vector u\ also minimizes 
the functional 

\\T lZ -(g-T u* )\\l {J) , (97) 

or, else, the vector z* minimizing ([97]) would satisfy J^ x {z*) < J^ x {u\)^ contradicting 
the minimality of u\. In fact, u\ must be the unique vector minimizing (|97p . For, if 
another vector u' also minimized (|97|) . then the operator T± would have a nontrivial 
null space containing the span of some nonzero vector v, so that all vectors in the 
affine space {u\ + tv : t E M} would be minimal solutions for (|97|) . In this case, we 
would have also the freedom of choosing from this affine subspace a vector u' having 
one coefficient u\ satisfying \u[\ < r. But such a vector v! satisfies J^ x {v!) < J^i{u\), 
contradicting the minimality of u\ . 
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It follows that the operator T\ must have trivial null space, and u\ is the unique 
minimal least squares solution to (|97p . well-known to be explicitly given by 

uI = (r 1 *Ti)- 1 27^-r «5), (98) 

so that T\u\ is the unique orthogonal projection of (g — TqUq) onto the range of T\. 
Actually V\ = Ti(T 1 *Ti) _1 T 1 * is the orthogonal projection onto the range of T\, due to 
the non-triviality of the null space of T\. Therefore we have T\u\ = V\(g — TqUq). It 
easily follows that 

TZ(T lU * 1 -(g-T o u* o ))=0, (99) 

or, in other words, 

[T*(g - Tu*)) i = 0, for all ieli. (100) 

Now, on the other hand, by observing that any optimal variable U\ for fixed uq depends 
on no via the relationship u\ = (T*Ti) T*(<? — Tquq), we easily infer that the vector 
Uq minimizes 

J?M = \\vi{T Q v-g)\\l {J) + J2 min{MV p }, (101) 

where V± denotes the orthogonal projection operator onto the orthogonal complement 
of the range of T% . 

Consider the convex functional, 

Hv) ■= ll^^-^lli^ + H^^, (102) 

and note that J^ {u) < F(u), while at the same time j7j? ( M o) = •^ r ( u o) by virtue of 
the fact that \u*\ < r. For p > 1 it follows that Uq is also a minimizer of J-(u), and 
so satisfies the Euler-Lagrange equations [B], 

(T *^(To4 - 9)) +^sgnu* \u* o r 1 = 0, (103) 

which imply the fixed point conditions 

[T*(g-Tu*)] i = |sgn (^K^r 1 , for all i £ Iq. (104) 

For p = 1 one uses results from |21j to conclude that 

ul = S 1/2 K + nvi{g - T u* )), (105) 

where § 7 is the so-called soft-thresholding, defined component-wise § 7 (i>) = (5 7 (uj))j 6 i, 
where 
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(Actually, |21|, Proposition 3.10] only states that any fixed point of (|105p is a minimizer 
of (|102p ; nevertheless the converse also holds, see |25[ Remarks (1), pag. 2515].) The 
fixed-point condition (|105|) implies 



T*(g-Tu*] 
T*(g-Tu*) 



It remains to verify that 

• \u*\ > X'(r,p), if i G I\, and 



€[-1/2,1/2], k*|<l/2 

= l/2sgntt*, 1/2 < |uj| < r. 1 ' 



< F p 1 (\'(r,p)), for p > 1, and |u*| < r — 1/4, for p = 1, if i € Xq- 



We show these conditions for p > 1 only, as the case p = 1 is proved with an analogous 
argument. 

1. We first show that \u*\ > \'(r,p) if i E X\. From the first part of the proof, we 
know that at a minimizer u* , the functional Jr{u*) can be written as 

JP( U *) = \\VHTou* - g)\\l (K) + IKII^ 0(I) + \1i\r p (108) 

Note that at this point we make explicit use of the finite cardinality of X\. Fix 
i € T\ and any perturbation h = hiei, hi S M, along the coordinate i (here, is 
the i th vector of the canonical basis). Consider the rank-one operator ij = TVi, 
where we use V% to denote the orthogonal projection onto the one-dimensional 
subspace spanned by e«. Observe that | tju|| = |(ii)j|||£j||. Since U is orthogonal 
to the argument Vi(TqUq — g) under the £2 penalty in (|108p . the minimality 
condition J p {u*) < J p (u* + h) can be written as 

\\vHTou* - g)\\l {K) + IKII^ (X) + \^\ rP 

< KCTouS-yJIl^ + KIl^^ 

+ llMi||' 2 (T) + min{rP, \u* + h^} + ^({1^ - 1) (109) 
which is equivalent to the condition that 

r p < ||Mi||| 2(X )+minK>l< + ^l P } (HO) 

hold for all hi G R. Now, since ||T|| < 1, it follows that < 1, and (fTTUj) 
implies that 

r p < h\ +min{r p ,\u* + h\ p } (111) 
holds for all hi G M, or, after the change of variables a = u* + hi, that 

r p <(a-u*f +mm{r p ,\a\ p } (112) 



44 



holds for all a£l. In particular, the inequality (|112p must hold at the value a 
that minimizes the right-hand-side. But we already know from Proposition 
that such a minimizer a* is of the form: 

a * = { Fp\K), \u*\<\'(r,p) 

\ it*, \u*\ > X'(r,p) 1 ; 

Now, suppose | it* | < X'(r,p). (We know that \u*\ > r, so then r < \u*\ < X'(r,p)). 
From the proof of Proposition [331 we know that the function F~ 1 (X) is increasing, 
so then a* = F~ l (u*) < Fp 1 (X') < r. Since also S p is strictly increasing, it 
follows that S p (a*) < Sp(F~ 1 (X')) < S P (X') = r p . In the last inequality we 
used ([82]) , (See also Table 1 for recalling the notations used here.) But this is a 
contradiction to the minimality condition, (|112|) . and so we must conclude that 
K*| > X'(r,p). 

2. We now show that \u*\ < F p ' 1 (X' (r,p)), if \u*\ < r. Recall that for i G 2" , the 
coefficient u* satisfies the fixed point condition, 

[ns-Tu^^sgn^Kr 1 . (114) 

Fix i £ Iq, and consider as before any perturbation h = hiCi along the coordinate 
i, hi E R. Let t{ be the rank-one operator as defined before. Then, the minimality 
condition Jr{u*) < Jf(u* + h) is easily seen to be equivalent to 

W Tu * -9\\l 2 (K) + \ u *i\ P ^ W Tu * ~ 9 + hiti\\ 2 l2{K) 

+ min{r p , \u* + hi\ p } 

= \\Tu* - g\\t 2 yq + ll^'*illl 2 (x) + 2hi(ti,Tu* - g) 
+ min{r p , \u* + hi\ p } 

= \\ Tu * ~ 9\\t 2 (K) + IIMi|||(x) - 2h i\ sgnn*|n- 

+ mm{r p ,\u* + hi\ p } (115) 

and the final equality follows directly from the fixed point condition (|114p . Now 
the chain of inequalities (|115p implies the minimality condition 

\u*\ p < WhiUWiw ~ 2h i ^sgau*\u*r 1 +mm{r p ,\u* + h t \ p } 

< -2h i ^sgnu*\u*\ p - 1 +mm{r p ,\u* + hi\ p }, (116) 

or, again using the change of variables a = u* + hi, the inequality 

KT < (a - v*) 2 - 2(a - <)| sgn^KI^ 1 + min{r p , \a\ p }. (117) 

Again, the inequality (|117p should hold for all a by the minimality of u*. Mini- 
mizers a* of the right-hand-side of (|117p also are minimizers of 

(a - {u* + | sgnulKP'- 1 )) 2 + min{rf, |a| p }, (118) 
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which we know to have the form 



a 



vS + 



2 S S n U i I 



IP-1 



\U] 

\u* 



+ 2 L \u*\ p - 1 < \'{r,p) 
^P- 1 > X'(r,p) 



+ 



(119) 



But u* + | sgn it* 1 = F p (u*), so the above reduces to 

_ / «?. ^(O < A'(r,p) 

\ F p (u*), F p (u*)>X'(r,p) {iZU) 

As before, the proof proceeds by contradiction. Suppose that F p (u*) > X'(r,p), 
so that a* = F p (u*) > X'(r,p) and S p (a*) > S P (X') = r p . Note that, by recalling 
F p (u*) = u* - %sgn(u*)(u*) p - 1 , we have 

S p (a*) = (u* - F p (u*)f + \u*\p = \v*]P + ^K| 2p ~ 2 . (121) 

f Plugging a* into the right-hand-side of (|117p . noting that X'(r,p) > r so that 
| a* | > r, and rearranging, yields the inequality 

\u*\ p < r p - ^\u*\ 2p - 2 or S p (a*) = \u*\ p + ^\u*\ 2p - 2 < r p . (122) 
But this contradicts the assumption that the expression in (|12ip be larger than 
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